12/16/2019

Introduction:

Airbnb lets travellers stay in someone’s home instead of a hotel, and lets people rent out spare space in their own homes to earn money from overnight guests. As more and more people join the platform, Airbnb awards the title of “Superhost” to its most dependable hosts. Many hosts have high ratings, but only a small fraction of them are superhosts. In this report I study what it takes to become a superhost and how superhosts differ from normal hosts.

Data Description:

The dataset is the Boston Airbnb open data from the Inside Airbnb website (http://insideairbnb.com/get-the-data.html). The data contains three main tables: listings, reviews and calendar. I focus mainly on the listings and reviews tables.

The listings table has 96 attributes, including price (continuous), longitude (continuous), latitude (continuous), listing type (categorical), host information (textual), neighbourhood (categorical), ratings (continuous), summary of the room (textual) and so on.

The reviews table has 6 attributes: date (time), comment ID (discrete), listing ID (discrete), reviewer ID (discrete), reviewer name (textual) and comment (textual).

Data Cleaning:

Prior to any analysis, the data was cleaned by fixing irregular NA values and converting the “t” and “f” values into TRUE and FALSE. The price data were converted from factor to numeric. The listings and reviews datasets were then merged into a new dataset.
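As an illustration of these cleaning steps, here is a minimal Python sketch. It is not the code used for this report. The column names `host_is_superhost` and `price`, the "$1,250.00" price format and the set of missing-value spellings in `NA_TOKENS` are assumptions about the raw listings file.

```python
from typing import Optional

# Assumed spellings of missing values in the raw file (hypothetical list).
NA_TOKENS = {"", "N/A", "NA", "n/a", "null"}


def clean_na(value: Optional[str]) -> Optional[str]:
    """Map any assumed missing-value spelling to None, else strip whitespace."""
    if value is None:
        return None
    stripped = value.strip()
    return None if stripped in NA_TOKENS else stripped


def to_bool(value: Optional[str]) -> Optional[bool]:
    """Convert the 't' / 'f' flags into True / False, keeping missing as None."""
    cleaned = clean_na(value)
    if cleaned is None:
        return None
    if cleaned == "t":
        return True
    if cleaned == "f":
        return False
    raise ValueError(f"unexpected flag: {value!r}")


def to_price(value: Optional[str]) -> Optional[float]:
    """Convert a price string such as '$1,250.00' into a float."""
    cleaned = clean_na(value)
    if cleaned is None:
        return None
    return float(cleaned.replace("$", "").replace(",", ""))


def clean_listing(row: dict) -> dict:
    """Return a copy of one listing row with the two assumed columns cleaned."""
    cleaned = dict(row)
    cleaned["host_is_superhost"] = to_bool(row.get("host_is_superhost"))
    cleaned["price"] = to_price(row.get("price"))
    return cleaned
```

Missing values are kept as `None` rather than dropped, so later steps can decide how to handle them.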

Exploratory Data Analysis:

To explore the differences between superhosts and normal hosts, I split the analysis into five main parts:
- 1. Host Information
- 2. Listing Information
- 3. Price Information
- 4. Booking Rules
- 5. Reviews Information

Part 1: Host Information Visualization

Figures in part 1 display EDA of host information. I explored the following attributes:
- Superhost
- Host Begin Year
- Host Response Time
- Host Response Rate
- Whether Host Has Profile Picture
- Host Identity Verified

Superhost Distribution

Figure 1 shows the distribution of superhost status. The number of superhosts is around one-third of the number of non-superhosts, that is, roughly a quarter of the total. This suggests that becoming a superhost is not easy.

Host Begin Year Distribution

Figure 2 shows the distribution of host begin year. There is a peak in 2014, meaning that a large number of people in Boston suddenly started renting out their homes and became new Airbnb hosts that year.

Host Response Time Distribution

Figure 3 shows the distribution of host response time. Most superhosts reply within an hour.

Host Response Rate Distribution

Figure 4 shows the distribution of host response rate. Most superhosts have a 100% response rate.

Part 2: Listing Information Visualization

Figures in part 2 display EDA of listing information. I explored the following attributes:
- Neighbourhood
- Room Type
- Number of Accommodates
- Number of Bathrooms
- Number of Bedrooms
- Number of Beds
- Bed Type

Figure 7 shows the distribution of neighbourhood. Figures 8-12 are mosaic plots showing the proportions of the different attributes. Figure 13 shows the distribution of bed type. I only show the most interesting ones here.

Neighbourhood Distribution

Figure 7 shows the distribution of neighbourhood. Brighton and Dorchester have relatively high superhost rates.

Room Type Distribution

Figure 8 shows the distribution of room type. Interestingly, the “hotel room” type has a high superhost rate, although the common room types are “entire home/apt” and “private room”.

Number of Accommodates

Figure 9 shows the distribution of the number of guests a listing accommodates. Interestingly, the superhost rate is higher when this number is even.

Number of Bathrooms

Compared to a listing with 1 bathroom, a listing with 1.5 bathrooms is more likely to belong to a superhost. The same holds for listings with 2 and 2.5 bathrooms. This suggests that an extra half bathroom appears to play an important role in earning the title of “Superhost”, although only a few listings have one.
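The comparisons in this part all come down to the share of superhost listings within each category of an attribute. A minimal Python sketch of that calculation follows. The column names `bathrooms` and `host_is_superhost` are assumed, and the toy rows are invented for illustration; they are not the Boston data.

```python
from collections import defaultdict


def superhost_rate_by(listings, key):
    """Return {category: (n_listings, superhost_share)} for one attribute."""
    totals = defaultdict(int)
    supers = defaultdict(int)
    for row in listings:
        category = row[key]
        totals[category] += 1
        if row["host_is_superhost"]:
            supers[category] += 1
    # Report the count next to the share so that small groups are visible.
    return {c: (totals[c], supers[c] / totals[c]) for c in totals}


# Invented toy rows, for illustration only.
toy = [
    {"bathrooms": 1.0, "host_is_superhost": False},
    {"bathrooms": 1.0, "host_is_superhost": False},
    {"bathrooms": 1.0, "host_is_superhost": True},
    {"bathrooms": 1.5, "host_is_superhost": True},
    {"bathrooms": 1.5, "host_is_superhost": False},
]
rates = superhost_rate_by(toy, "bathrooms")
```

Returning the group size together with the share matters here because categories with only a few listings, such as half-bathroom counts, can show a high rate by chance.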

Part 3: Price Information Visualization

Figures in part 3 display EDA of price information. These figures show the price-related attributes plotted against average rating. I explored the following attributes:
- Price
- Security Deposit
- Cleaning Fee
- Extra People Fee

Price vs. Rating

Figure 14 displays listing price against average rating. The superhost points cluster in the bottom-right corner of the figure, meaning that superhosts’ listings tend to be cheap and highly rated.

Security Deposit vs. Rating

Figure 15 displays security deposit against average rating. The superhost points again cluster in the bottom-right corner of the figure. For highly rated listings (close to 100), a high security deposit does not seem to affect “Superhost” status.

Cleaning Fee vs. Rating

Figure 16 displays cleaning fee against average rating. The superhost points again cluster in the bottom-right corner of the figure. For highly rated listings (close to 100), a high cleaning fee does not seem to affect “Superhost” status.

Part 4: Booking Rules Visualization

Figures in part 4 show the restrictions on booking a listing. I explored the following attributes:
- Minimum Nights
- Maximum Nights
- Cancellation Policy
- Whether Guest Profile Picture Is Required

Minimum Nights vs. Maximum Nights

Figure 18 displays minimum nights against maximum nights. Most superhosts set a small minimum number of nights, while the maximum number of nights does not seem to matter.

Cancellation Policy Distribution

Figure 19 displays the cancellation policy distribution. The superhost rate is clearly higher in the moderate cancellation policy bar.

Require Guest Profile Picture

Figure 20 displays the distribution of whether a guest profile picture is required. Most superhosts require a guest profile picture.

Part 5: Review Information Visualization

Figures in part 5 display EDA of reviews information. I explored the following attributes:
- Number of Reviews
- Review Scores of Rating
- Review Scores of Accuracy
- Review Scores of Cleanliness
- Review Scores of Checkin
- Review Scores of Communication
- Review Scores of Location
- Review Scores of Value
- Number of Reviews Per Month

From the figures we can see that superhosts have more reviews, higher average ratings and more reviews per month.

Average Number of Reviews

Number of Reviews vs. Average Rating

Number of Reviews vs. Average Rating
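A comparison like the one in this part can be summarised numerically as group means. The following is a minimal Python sketch, assuming the columns are named `number_of_reviews` and `review_scores_rating` and that missing scores are stored as `None`. The toy rows are invented for illustration and are not the Boston data.

```python
from statistics import mean


def group_means(listings, fields):
    """Average each field separately for superhost and normal-host listings."""
    groups = {True: [], False: []}
    for row in listings:
        groups[row["host_is_superhost"]].append(row)
    # Missing values (None) are skipped; a field that is missing for every
    # row of a group would make statistics.mean raise StatisticsError.
    return {
        is_super: {f: mean(r[f] for r in rows if r[f] is not None) for f in fields}
        for is_super, rows in groups.items()
        if rows
    }


# Invented toy rows, for illustration only.
toy = [
    {"host_is_superhost": True, "number_of_reviews": 40, "review_scores_rating": 98},
    {"host_is_superhost": True, "number_of_reviews": 60, "review_scores_rating": 96},
    {"host_is_superhost": False, "number_of_reviews": 10, "review_scores_rating": 90},
    {"host_is_superhost": False, "number_of_reviews": 20, "review_scores_rating": None},
]
means = group_means(toy, ["number_of_reviews", "review_scores_rating"])
```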

Maps Visualization

As of December 4, 2019, there are over 3000 Airbnb listings in Boston. The map shows the locations of superhosts’ listings (red) and normal hosts’ listings (green). Surprisingly, most superhosts’ listings are not located in the centre of Boston or the downtown area.

Text Mining

To dig deeper into the data, I applied text mining to the names and descriptions of the listings and ran sentiment analysis on the review comments.

Text Mining of Name

These are the text mining results for the listing names. There seems to be no difference between superhosts and normal hosts in the most common words used in listing names. From the figures, however, we can see that the most common joy word used by superhosts is “sunny”, while the most common joy word used by normal hosts is “beautiful”. Many superhosts also mention “comfort” and “safe”, while normal hosts focus more on “luxury” and “perfect”. This suggests that superhosts name their listings with meaningful words rather than empty ones.

Most Common Words of Name (Super Host)

Most Common Words of Name (Normal Host)

Most Common Joy Words of Name (Super Host)

Most Common Joy Words of Name (Normal Host)

Wordclouds of Name (Super Host)

Wordclouds of Name (Normal Host)
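Word counts of the kind shown in the figures above can be sketched in a few lines of Python. This is an illustration only, not the pipeline used for this report. The stopword list is a small hypothetical one and the listing names are invented examples.

```python
import re
from collections import Counter

# Small hypothetical stopword list; a real analysis would use a fuller one.
STOPWORDS = {"a", "an", "and", "in", "of", "the", "to", "with", "near"}


def top_words(texts, n=10):
    """Return the n most frequent non-stopword tokens across the texts."""
    counts = Counter()
    for text in texts:
        tokens = re.findall(r"[a-z']+", text.lower())
        counts.update(t for t in tokens if t not in STOPWORDS)
    return counts.most_common(n)


# Invented listing names, for illustration only.
names_by_group = {
    "superhost": ["Sunny room near the park", "Sunny and safe studio"],
    "normal": ["Beautiful luxury condo", "Perfect luxury loft in Boston"],
}
top = {group: top_words(names, n=3) for group, names in names_by_group.items()}
```

Running the same function on each group separately is what makes the two word lists comparable side by side.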

Text Mining of Description

These are the text mining results for the listing descriptions. Again, there seems to be no difference between superhosts and normal hosts in the most common words used in the descriptions.

Most Common Words of Description (Super Host)

Most Common Words of Description (Normal Host)

Most Common Joy Words of Description (Super Host)

Most Common Joy Words of Description (Normal Host)

Wordclouds of Description (Super Host)

Wordclouds of Description (Normal Host)

Text Mining of Comments

These are the text mining results for the review comments. Again, there seems to be no difference in the most common words between comments on superhosts’ listings and comments on normal hosts’ listings. The four colourful figures show the sentiment analysis results. Compared to normal hosts’ listings, comments on superhosts’ listings contain positive words more often and negative words only rarely.

Most Common Words of Comments (Super Host)

Most Common Words of Comments (Normal Host)

Positive Words of Comments (Super Host)

Positive Words of Comments (Normal Host)

Negative Words of Comments (Super Host)

Negative Words of Comments (Normal Host)

Sentiment Wordclouds of Comments (Super Host)

Sentiment Wordclouds of Comments (Normal Host)
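The idea behind this kind of sentiment comparison can be sketched as a lexicon lookup: count how many comment words appear in a positive list and how many in a negative list. The Python sketch below uses two tiny hypothetical word lists and invented comments. It does not reproduce the lexicon or the data used for the figures above.

```python
import re

# Tiny hypothetical lexicons; a real analysis would use a published one.
POSITIVE = {"great", "clean", "comfortable", "lovely", "helpful"}
NEGATIVE = {"dirty", "noisy", "rude", "broken", "late"}


def sentiment_shares(comments):
    """Return the share of all tokens that are positive or negative words."""
    positive = negative = total = 0
    for comment in comments:
        for token in re.findall(r"[a-z']+", comment.lower()):
            total += 1
            if token in POSITIVE:
                positive += 1
            elif token in NEGATIVE:
                negative += 1
    if total == 0:
        return {"positive": 0.0, "negative": 0.0}
    return {"positive": positive / total, "negative": negative / total}


# Invented comments, for illustration only.
toy_comments = ["Great host, clean room", "Noisy street but helpful host"]
shares = sentiment_shares(toy_comments)
```

Using shares rather than raw counts keeps the two groups comparable even when one group has many more comments than the other.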

Conclusion

Although it is not easy to become a superhost on Airbnb, these exploration results give a sense of what it takes. In general, a superhost tends to satisfy the following:
- Reply to messages within an hour, with a 100% response rate.
- Be located in Brighton or Dorchester, the top two locations.
- Accommodate an even number of guests.
- Preferably have an extra half bathroom.
- Set a moderate cancellation policy.
- Use joyful and meaningful words, as well as details, in the listing’s name and description.